Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 440 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 31.1 KiB |
| Average record size in memory | 72.3 B |
Variable types
| NUM | 7 |
|---|---|
| CAT | 2 |
Warnings
| Detergents_Paper is highly correlated with Grocery | High correlation |
|---|---|
| Grocery is highly correlated with Detergents_Paper | High correlation |
| Buyer/Spender has unique values | Unique |
Reproduction
| Analysis started | 2020-09-11 02:28:43.076601 |
|---|---|
| Analysis finished | 2020-09-11 02:28:51.241504 |
| Duration | 8.16 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
Buyer/Spender
Real number (ℝ≥0)
| Distinct | 440 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 220.5 |
|---|---|
| Minimum | 1 |
| Maximum | 440 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 22.95 |
| Q1 | 110.75 |
| median | 220.5 |
| Q3 | 330.25 |
| 95-th percentile | 418.05 |
| Maximum | 440 |
| Range | 439 |
| Interquartile range (IQR) | 219.5 |
Descriptive statistics
| Standard deviation | 127.1613149 |
|---|---|
| Coefficient of variation (CV) | 0.5766953055 |
| Kurtosis | -1.2 |
| Mean | 220.5 |
| Median Absolute Deviation (MAD) | 110 |
| Skewness | 0 |
| Sum | 97020 |
| Variance | 16170 |
| Monotonicity | Strictly increasing |
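Because Buyer/Spender is a record ID running from 1 to 440, its summary statistics follow closed forms. The sketch below recomputes several of the figures above for the sequence 1..n using only the Python standard library. `id_column_stats` is a hypothetical helper name, and the quantiles use linear interpolation between order statistics (`method="inclusive"`), which is assumed here to match the report's quantile method.

```python
import statistics

def id_column_stats(n: int) -> dict:
    """Summary statistics for the ID sequence 1..n (hypothetical helper)."""
    data = list(range(1, n + 1))
    median = statistics.median(data)
    # Quartiles and 5%/95% percentiles by linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    ventiles = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "sum": sum(data),                       # closed form: n(n+1)/2
        "mean": statistics.mean(data),          # closed form: (n+1)/2
        "variance": statistics.variance(data),  # sample variance, closed form: n(n+1)/12
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "iqr": q3 - q1,
        # Median absolute deviation: the median of |x - median|.
        "mad": statistics.median(abs(x - median) for x in data),
    }

stats = id_column_stats(440)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With n = 440 this reproduces the sum (97020), variance (16170), quartiles (110.75 and 330.25), 5th and 95th percentiles (22.95 and 418.05), IQR (219.5) and MAD (110) reported above.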
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 440 | 1 | 0.2% |
| 151 | 1 | 0.2% |
| 140 | 1 | 0.2% |
| 141 | 1 | 0.2% |
| 142 | 1 | 0.2% |
| 143 | 1 | 0.2% |
| 144 | 1 | 0.2% |
| 145 | 1 | 0.2% |
| 146 | 1 | 0.2% |
| 147 | 1 | 0.2% |
| Other values (430) | 430 | 97.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 440 | 1 | 0.2% |
| 439 | 1 | 0.2% |
| 438 | 1 | 0.2% |
| 437 | 1 | 0.2% |
| 436 | 1 | 0.2% |
Channel
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| Hotel | 298 | 67.7% |
| Retail | 142 | 32.3% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 5 |
| Mean length | 5.322727273 |
| Min length | 5 |
Region
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| Other | 316 | 71.8% |
| Lisbon | 77 | 17.5% |
| Oporto | 47 | 10.7% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 5 |
| Mean length | 5.281818182 |
| Min length | 5 |
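The category length statistics for Channel and Region are computed over rows, so each label's length is weighted by how often the label occurs. A minimal sketch, assuming "length" means the number of characters in the label and using the value counts reported above; `length_stats` is a hypothetical helper name.

```python
import statistics

def length_stats(value_counts: dict[str, int]) -> dict:
    """Length statistics over all rows, given a {category: count} mapping."""
    # Expand the counts so every row contributes the length of its label once.
    lengths = [len(label) for label, count in value_counts.items() for _ in range(count)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }

channel = length_stats({"Hotel": 298, "Retail": 142})
region = length_stats({"Other": 316, "Lisbon": 77, "Oporto": 47})
print(channel)
print(region)
```

For Channel the mean is (298 × 5 + 142 × 6) / 440 ≈ 5.3227, and for Region it is (316 × 5 + 124 × 6) / 440 ≈ 5.2818, matching the reported mean lengths.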
Fresh
Real number (ℝ≥0)
| Distinct | 433 |
|---|---|
| Distinct (%) | 98.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12000.29773 |
|---|---|
| Minimum | 3 |
| Maximum | 112151 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 401.9 |
| Q1 | 3127.75 |
| median | 8504 |
| Q3 | 16933.75 |
| 95-th percentile | 36818.5 |
| Maximum | 112151 |
| Range | 112148 |
| Interquartile range (IQR) | 13806 |
Descriptive statistics
| Standard deviation | 12647.32887 |
|---|---|
| Coefficient of variation (CV) | 1.053917924 |
| Kurtosis | 11.53640849 |
| Mean | 12000.29773 |
| Median Absolute Deviation (MAD) | 5919.5 |
| Skewness | 2.561322752 |
| Sum | 5280131 |
| Variance | 159954927.4 |
| Monotonicity | Not monotonic |
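Several rows in these tables are derived from others: the range is the maximum minus the minimum, the IQR is Q3 minus Q1, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A small consistency check using the Fresh figures reported above; `derived_stats` is a hypothetical helper name.

```python
import math

def derived_stats(minimum, maximum, q1, q3, mean, std):
    """Recompute the derived summary statistics from the primary ones."""
    return {
        "range": maximum - minimum,
        "iqr": q3 - q1,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation (unitless)
    }

# Figures copied from the Fresh section of this report.
fresh = derived_stats(minimum=3, maximum=112151, q1=3127.75, q3=16933.75,
                      mean=12000.29773, std=12647.32887)
print(fresh)
```

Because the standard deviation and mean are printed with about ten significant digits, the recomputed variance and CV agree with the table only up to that rounding.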
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 9670 | 2 | 0.5% |
| 3 | 2 | 0.5% |
| 8040 | 2 | 0.5% |
| 514 | 2 | 0.5% |
| 18044 | 2 | 0.5% |
| 3366 | 2 | 0.5% |
| 7149 | 2 | 0.5% |
| 1420 | 1 | 0.2% |
| 4456 | 1 | 0.2% |
| 13134 | 1 | 0.2% |
| Other values (423) | 423 | 96.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2 | 0.5% |
| 9 | 1 | 0.2% |
| 18 | 1 | 0.2% |
| 23 | 1 | 0.2% |
| 37 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 112151 | 1 | 0.2% |
| 76237 | 1 | 0.2% |
| 68951 | 1 | 0.2% |
| 56159 | 1 | 0.2% |
| 56083 | 1 | 0.2% |
Milk
Real number (ℝ≥0)
| Distinct | 421 |
|---|---|
| Distinct (%) | 95.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5796.265909 |
|---|---|
| Minimum | 55 |
| Maximum | 73498 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 55 |
|---|---|
| 5-th percentile | 593.75 |
| Q1 | 1533 |
| median | 3627 |
| Q3 | 7190.25 |
| 95-th percentile | 16843.4 |
| Maximum | 73498 |
| Range | 73443 |
| Interquartile range (IQR) | 5657.25 |
Descriptive statistics
| Standard deviation | 7380.377175 |
|---|---|
| Coefficient of variation (CV) | 1.273298584 |
| Kurtosis | 24.66939775 |
| Mean | 5796.265909 |
| Median Absolute Deviation (MAD) | 2460 |
| Skewness | 4.053754849 |
| Sum | 2550357 |
| Variance | 54469967.24 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1897 | 2 | 0.5% |
| 5139 | 2 | 0.5% |
| 659 | 2 | 0.5% |
| 829 | 2 | 0.5% |
| 944 | 2 | 0.5% |
| 2884 | 2 | 0.5% |
| 3880 | 2 | 0.5% |
| 1032 | 2 | 0.5% |
| 577 | 2 | 0.5% |
| 3199 | 2 | 0.5% |
| Other values (411) | 420 | 95.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 55 | 1 | 0.2% |
| 112 | 1 | 0.2% |
| 134 | 1 | 0.2% |
| 201 | 1 | 0.2% |
| 254 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 73498 | 1 | 0.2% |
| 54259 | 1 | 0.2% |
| 46197 | 1 | 0.2% |
| 43950 | 1 | 0.2% |
| 38369 | 1 | 0.2% |
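The common-values tables list the ten most frequent values and fold everything else into an "Other values (k)" row, where k counts the remaining distinct values and the Count column counts the remaining rows. That is why Milk shows 411 other distinct values (421 - 10) but 420 other rows (440 - 20, since each of the ten listed values occurs twice). A minimal sketch with `collections.Counter`, assuming ties are broken in first-seen order (the report's own tie-breaking may differ); `common_values` is a hypothetical helper name.

```python
from collections import Counter

def common_values(values, top=10):
    """Return (rows, other_distinct, other_count) for a common-values table."""
    counts = Counter(values)
    total = len(values)
    # most_common orders by count; equal counts keep first-seen order.
    rows = [(v, c, 100 * c / total) for v, c in counts.most_common(top)]
    shown = sum(c for _, c, _ in rows)
    other_distinct = len(counts) - len(rows)
    other_count = total - shown
    return rows, other_distinct, other_count

sample = [5, 5, 7, 7, 9, 1, 2, 3]
rows, other_distinct, other_count = common_values(sample, top=2)
for value, count, pct in rows:
    print(f"| {value} | {count} | {pct:.1f}% |")
print(f"| Other values ({other_distinct}) | {other_count} | {100 * other_count / len(sample):.1f}% |")
```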
Grocery
Real number (ℝ≥0)
| Distinct | 430 |
|---|---|
| Distinct (%) | 97.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7951.277273 |
|---|---|
| Minimum | 3 |
| Maximum | 92780 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 851.45 |
| Q1 | 2153 |
| median | 4755.5 |
| Q3 | 10655.75 |
| 95-th percentile | 24033.5 |
| Maximum | 92780 |
| Range | 92777 |
| Interquartile range (IQR) | 8502.75 |
Descriptive statistics
| Standard deviation | 9503.162829 |
|---|---|
| Coefficient of variation (CV) | 1.195174373 |
| Kurtosis | 20.91467039 |
| Mean | 7951.277273 |
| Median Absolute Deviation (MAD) | 3093.5 |
| Skewness | 3.58742869 |
| Sum | 3498562 |
| Variance | 90310103.75 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1664 | 2 | 0.5% |
| 2405 | 2 | 0.5% |
| 1493 | 2 | 0.5% |
| 1563 | 2 | 0.5% |
| 3600 | 2 | 0.5% |
| 683 | 2 | 0.5% |
| 2406 | 2 | 0.5% |
| 6536 | 2 | 0.5% |
| 10391 | 2 | 0.5% |
| 2062 | 2 | 0.5% |
| Other values (420) | 420 | 95.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 1 | 0.2% |
| 137 | 1 | 0.2% |
| 218 | 1 | 0.2% |
| 223 | 1 | 0.2% |
| 245 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 92780 | 1 | 0.2% |
| 67298 | 1 | 0.2% |
| 59598 | 1 | 0.2% |
| 55571 | 1 | 0.2% |
| 45828 | 1 | 0.2% |
Frozen
Real number (ℝ≥0)
| Distinct | 426 |
|---|---|
| Distinct (%) | 96.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3071.931818 |
|---|---|
| Minimum | 25 |
| Maximum | 60869 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 25 |
|---|---|
| 5-th percentile | 136.85 |
| Q1 | 742.25 |
| median | 1526 |
| Q3 | 3554.25 |
| 95-th percentile | 9930.75 |
| Maximum | 60869 |
| Range | 60844 |
| Interquartile range (IQR) | 2812 |
Descriptive statistics
| Standard deviation | 4854.673333 |
|---|---|
| Coefficient of variation (CV) | 1.580332384 |
| Kurtosis | 54.6892807 |
| Mean | 3071.931818 |
| Median Absolute Deviation (MAD) | 1084.5 |
| Skewness | 5.907985692 |
| Sum | 1351650 |
| Variance | 23567853.17 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 744 | 2 | 0.5% |
| 779 | 2 | 0.5% |
| 1619 | 2 | 0.5% |
| 364 | 2 | 0.5% |
| 848 | 2 | 0.5% |
| 4324 | 2 | 0.5% |
| 937 | 2 | 0.5% |
| 830 | 2 | 0.5% |
| 2540 | 2 | 0.5% |
| 402 | 2 | 0.5% |
| Other values (416) | 420 | 95.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 25 | 1 | 0.2% |
| 33 | 1 | 0.2% |
| 36 | 1 | 0.2% |
| 38 | 1 | 0.2% |
| 42 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 60869 | 1 | 0.2% |
| 36534 | 1 | 0.2% |
| 35009 | 1 | 0.2% |
| 18711 | 1 | 0.2% |
| 18028 | 1 | 0.2% |
Detergents_Paper
Real number (ℝ≥0)
| Distinct | 417 |
|---|---|
| Distinct (%) | 94.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2881.493182 |
|---|---|
| Minimum | 3 |
| Maximum | 40827 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 63.7 |
| Q1 | 256.75 |
| median | 816.5 |
| Q3 | 3922 |
| 95-th percentile | 12043.2 |
| Maximum | 40827 |
| Range | 40824 |
| Interquartile range (IQR) | 3665.25 |
Descriptive statistics
| Standard deviation | 4767.854448 |
|---|---|
| Coefficient of variation (CV) | 1.654647139 |
| Kurtosis | 19.00946434 |
| Mean | 2881.493182 |
| Median Absolute Deviation (MAD) | 715.5 |
| Skewness | 3.631850631 |
| Sum | 1267857 |
| Variance | 22732436.04 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 227 | 2 | 0.5% |
| 311 | 2 | 0.5% |
| 118 | 2 | 0.5% |
| 811 | 2 | 0.5% |
| 788 | 2 | 0.5% |
| 153 | 2 | 0.5% |
| 96 | 2 | 0.5% |
| 93 | 2 | 0.5% |
| 284 | 2 | 0.5% |
| 955 | 2 | 0.5% |
| Other values (407) | 420 | 95.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2 | 0.5% |
| 5 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 40827 | 1 | 0.2% |
| 38102 | 1 | 0.2% |
| 26701 | 1 | 0.2% |
| 24231 | 1 | 0.2% |
| 24171 | 1 | 0.2% |
Delicatessen
Real number (ℝ≥0)
| Distinct | 403 |
|---|---|
| Distinct (%) | 91.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1524.870455 |
|---|---|
| Minimum | 3 |
| Maximum | 47943 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 63.95 |
| Q1 | 408.25 |
| median | 965.5 |
| Q3 | 1820.25 |
| 95-th percentile | 4485.4 |
| Maximum | 47943 |
| Range | 47940 |
| Interquartile range (IQR) | 1412 |
Descriptive statistics
| Standard deviation | 2820.105937 |
|---|---|
| Coefficient of variation (CV) | 1.849406898 |
| Kurtosis | 170.6949393 |
| Mean | 1524.870455 |
| Median Absolute Deviation (MAD) | 637.5 |
| Skewness | 11.15158648 |
| Sum | 670943 |
| Variance | 7952997.498 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 834 | 4 | 0.9% |
| 3 | 4 | 0.9% |
| 548 | 3 | 0.7% |
| 1215 | 3 | 0.7% |
| 395 | 3 | 0.7% |
| 610 | 3 | 0.7% |
| 290 | 2 | 0.5% |
| 379 | 2 | 0.5% |
| 46 | 2 | 0.5% |
| 750 | 2 | 0.5% |
| Other values (393) | 412 | 93.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 4 | 0.9% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 18 | 2 | 0.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 47943 | 1 | 0.2% |
| 16523 | 1 | 0.2% |
| 14472 | 1 | 0.2% |
| 14351 | 1 | 0.2% |
| 8550 | 1 | 0.2% |
Correlations
Pearson's r
The Pearson correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a total negative linear correlation, 0 indicates no linear correlation, and +1 indicates a total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables (a negative scale factor only flips its sign), so for an exact linear relationship Y = aX + b with a ≠ 0, r is +1 or -1 regardless of how steep the line is. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
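A dependency-free sketch of that computation in Python. It is applied here to the Grocery and Detergents_Paper values of the first ten rows listed under "First rows" in this report, a sample chosen only for illustration, so its r is not the full-dataset coefficient behind the high-correlation warning. `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The 1/(n-1) normalisation cancels between numerator and denominator.
    return cov / (sx * sy)

grocery = [7561, 9568, 7684, 4221, 7198, 5126, 6975, 9426, 6192, 18881]
detergents = [2674, 3293, 3516, 507, 1777, 1795, 3140, 3321, 1716, 7425]
print(round(pearson_r(grocery, detergents), 3))
```

Shifting either input or rescaling it by a positive factor leaves the result unchanged, while negating one input flips its sign.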
Spearman's ρ
The Spearman rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
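A dependency-free sketch of the rank-based computation just described. Tied values receive the average of the positions they occupy, a common convention assumed here; `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(pearson_r(x, y), spearman_rho(x, y))
```

For this cubic relationship Pearson's r stays below 1 because the points do not lie on a line, while ρ is exactly 1 because the ranks agree perfectly.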
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a total negative correlation, 0 indicates no correlation, and +1 indicates a total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
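A direct sketch of that pair-counting definition (the variant often called tau-a). Library implementations such as SciPy's `kendalltau` default to the tie-adjusted tau-b, so results can differ when the data contain ties; `kendall_tau_a` is a hypothetical helper name.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# 2 concordant pairs and 1 discordant pair out of 3: tau = 1/3
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop is O(n²), which is fine for illustration; production implementations use sorting-based O(n log n) algorithms.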
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
First rows
| | Buyer/Spender | Channel | Region | Fresh | Milk | Grocery | Frozen | Detergents_Paper | Delicatessen |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Retail | Other | 12669 | 9656 | 7561 | 214 | 2674 | 1338 |
| 1 | 2 | Retail | Other | 7057 | 9810 | 9568 | 1762 | 3293 | 1776 |
| 2 | 3 | Retail | Other | 6353 | 8808 | 7684 | 2405 | 3516 | 7844 |
| 3 | 4 | Hotel | Other | 13265 | 1196 | 4221 | 6404 | 507 | 1788 |
| 4 | 5 | Retail | Other | 22615 | 5410 | 7198 | 3915 | 1777 | 5185 |
| 5 | 6 | Retail | Other | 9413 | 8259 | 5126 | 666 | 1795 | 1451 |
| 6 | 7 | Retail | Other | 12126 | 3199 | 6975 | 480 | 3140 | 545 |
| 7 | 8 | Retail | Other | 7579 | 4956 | 9426 | 1669 | 3321 | 2566 |
| 8 | 9 | Hotel | Other | 5963 | 3648 | 6192 | 425 | 1716 | 750 |
| 9 | 10 | Retail | Other | 6006 | 11093 | 18881 | 1159 | 7425 | 2098 |
Last rows
| | Buyer/Spender | Channel | Region | Fresh | Milk | Grocery | Frozen | Detergents_Paper | Delicatessen |
|---|---|---|---|---|---|---|---|---|---|
| 430 | 431 | Hotel | Other | 3097 | 4230 | 16483 | 575 | 241 | 2080 |
| 431 | 432 | Hotel | Other | 8533 | 5506 | 5160 | 13486 | 1377 | 1498 |
| 432 | 433 | Hotel | Other | 21117 | 1162 | 4754 | 269 | 1328 | 395 |
| 433 | 434 | Hotel | Other | 1982 | 3218 | 1493 | 1541 | 356 | 1449 |
| 434 | 435 | Hotel | Other | 16731 | 3922 | 7994 | 688 | 2371 | 838 |
| 435 | 436 | Hotel | Other | 29703 | 12051 | 16027 | 13135 | 182 | 2204 |
| 436 | 437 | Hotel | Other | 39228 | 1431 | 764 | 4510 | 93 | 2346 |
| 437 | 438 | Retail | Other | 14531 | 15488 | 30243 | 437 | 14841 | 1867 |
| 438 | 439 | Hotel | Other | 10290 | 1981 | 2232 | 1038 | 168 | 2125 |
| 439 | 440 | Hotel | Other | 2787 | 1698 | 2510 | 65 | 477 | 52 |